Tracking the Lexical Zeitgeist with WordNet and Wikipedia

Author

  • Tony Veale
Abstract

Most new words, or neologisms, bubble beneath the surface of widespread usage for some time, perhaps even years, before gaining acceptance in conventional print dictionaries [1]. A shorter, yet still significant, delay is also evident in the life-cycle of NLP-oriented lexical resources like WordNet [2]. A more topical lexical resource is Wikipedia [3], an open-source community-maintained encyclopedia whose headwords reflect the many new words that gain recognition in a particular linguistic sub-culture. In this paper we describe the principles behind Zeitgeist, a system for dynamic lexicon growth that harvests and semantically analyses new lexical forms from Wikipedia, to automatically enrich WordNet as these new word forms are minted. Zeitgeist demonstrates good results for composite words that exhibit a complex morphemic structure, such as portmanteau words and formal blends [4, 5].
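As a rough illustration of the kind of analysis a formal blend invites, the sketch below splits a new word form into a head and a tail and looks for known words that begin with the head and end with the tail. It is a minimal sketch under stated assumptions, not the Zeitgeist algorithm: the function name `blend_candidates`, the `min_len` parameter, and the toy lexicon are all hypothetical, and a real system would draw its known words from WordNet and its new forms from Wikipedia headwords.

```python
def blend_candidates(word, lexicon, min_len=2):
    """List ways `word` could blend the start of one known word with the end of another.

    Returns (head, tail, left_word, right_word) tuples where word == head + tail,
    left_word starts with head, and right_word ends with tail.
    """
    candidates = []
    # Try every split point that leaves at least `min_len` characters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        lefts = sorted(w for w in lexicon if w != word and w.startswith(head))
        rights = sorted(w for w in lexicon if w != word and w.endswith(tail))
        candidates.extend((head, tail, l, r) for l in lefts for r in rights)
    return candidates


# Toy lexicon standing in for WordNet entries (hypothetical data).
toy_lexicon = {"breakfast", "lunch", "smoke", "fog"}
print(blend_candidates("brunch", toy_lexicon))
# prints [('br', 'unch', 'breakfast', 'lunch')]
```

String matching of this kind only proposes candidate source words. Choosing among them, and deciding where the new word belongs in WordNet, would need the semantic analysis that the abstract attributes to Zeitgeist.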

Similar resources

Zeitgeist: A Computational Model of Neologism Processing

Language is a dynamic landscape in which words are not fixed landmarks, but fickle signposts that switch their directions as archaic senses are lost and new, more topical senses, are gained. Frequently, entirely new lexical signposts are added as newly minted word-forms enter the language. Some of these new forms are cut from whole cloth and have their origins in creative writing, movies or gam...

Extracting Lexical Reference Rules from Wikipedia

This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms. We present extraction methods geared to cover the broad range of the lexical reference relation and analyze them extensively. Most extraction methods yield high precision levels, and our rule-base is shown to perform better than other automatically con...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia

This paper describes an automatic approach to identify lexical patterns which represent semantic relationships between concepts, from an on-line encyclopedia. Next, these patterns can be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automat...

Wordventure – developing WordNet in Wikipedia-like style

The article describes an approach for building a WordNet semantic dictionary in a collaborative way. The idea of gathering lexical data has been proposed, as well as a system for linguistic data acquisition and management.

Publication date: 2006